RTS Model V2 Performance Analysis

TODO:

  • Possible to remove the extra influence of multiple RTS within one tile?
  • Figure out why my IoU scores don’t quite match Yili’s. Possibly they are being calculated on a different set of polygons? Still not sure whether I need to remove classes that are missing from predictions.
  • Decide whether to use the best RTS threshold or 0.5.
  • Figure out an explanation method to identify which cells were most important to predictions. Does this even make sense in a context where the model outputs probabilities rather than a label?
  • Try a PCA on the zonal statistics dataset.

Set-Up

Load Libraries

Prep Google Drive Authentication

Define Functions

assign_conf_stars

avg_precision

bbox

calc_rts_threshold

eval_expression

filter_input_data

get_rast_id

googledrive_download

input_as_df

import_pred

lm_contrasts

lm_summary

make_my_dir

my_lm

plot_prediction

plot_zonal_stats

pred_as_poly

recall_precision

trim_outliers

val_as_poly

Prep Plot Variables

Load Data

Polygons

## Reading layer `rts_polygons_for_Yili_May_2022_v2' from data source 
##   `/home/hrodenhizer/Documents/permafrost_pathways/rts_mapping/rts_data_comparison/data/rts_polygons/rts_polygons_for_Yili_May_2022_v2.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 138 features and 11 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -139.218 ymin: 68.99102 xmax: 124.304 ymax: 72.99794
## Geodetic CRS:  WGS 84

Download Files

## [1] "There are 69 prediction tiles."
## [1] "There are 138 input data tiles."

Maxar GeoTiffs

Planet GeoTiffs

Sentinel GeoTiffs

Get Tile Bounding Boxes

Convert Predictions to Vector

Calculate Best RTS Threshold
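One way to pick the best threshold (a sketch in Python with hypothetical `prob`/`truth` arrays; the analysis itself works on prediction rasters in R) is to sweep candidate cutoffs over the predicted probabilities and keep the one that maximizes IoU against the validation mask:

```python
import numpy as np

def best_rts_threshold(prob, truth, thresholds=np.linspace(0.05, 0.95, 19)):
    """Pick the probability cutoff that maximizes IoU against the validation mask."""
    best_t, best_iou = 0.5, -1.0
    for t in thresholds:
        pred = prob >= t
        inter = np.logical_and(pred, truth).sum()
        union = np.logical_or(pred, truth).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_t, best_iou = t, iou
    return best_t, best_iou

# toy example: per-cell RTS probabilities and a ground-truth mask
prob = np.array([0.1, 0.4, 0.6, 0.9, 0.2, 0.8])
truth = np.array([False, False, True, True, False, True])
t, iou = best_rts_threshold(prob, truth)
```

Comparing the best IoU found here against the IoU at 0.5 is one way to settle the "best threshold vs. 0.5" TODO item above.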

Join Prediction Polygons into polys SF Dataframe

Convert Validation to Vector

Join Validation Polygons into polys SF Dataframe

Convert Input Data to DF

Interactive Map of Features

IoU

Calculate Intersection and Union

Calculate IoU
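The actual intersections and unions are computed on sf polygons in R; as a toy Python sketch of the same IoU definition, using axis-aligned rectangles in place of real geometries:

```python
def rect_iou(a, b):
    """IoU of two axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# two overlapping 2x2 boxes: intersection area 1, union area 7
iou = rect_iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7
```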

IoU (Calculated in Model)

Imagery (normalization method)   Test IoU   Val IoU
Maxar (0-1)                          0.75      0.69
Maxar (z-score)                      0.73      0.72
Planet (0-1)                         0.71      0.70
Sentinel-2 (0-1)                     0.68      0.66

IoU (Calculated Directly from Predictions)

Imagery      Val IoU   Val IoU (remove zeros)
Maxar           0.67                     0.73
Planet          0.64                     0.71
Sentinel-2      0.61                     0.67

Mean Average Precision

Recall-Precision Curves

MAP

Plot

Precision is sensitive to false positives (of all predicted RTS, how many are real?); recall is sensitive to false negatives (of all true RTS, how many were detected?).
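As a back-of-the-envelope sketch (in Python; the analysis itself is in R), with hypothetical true-positive, false-positive, and false-negative counts:

```python
def precision_recall(tp, fp, fn):
    """Precision penalizes false positives; recall penalizes false negatives."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall(tp=40, fp=10, fn=20)
# p = 0.8 (few false positives), r = 2/3 (more false negatives)
```

Average precision then summarizes the recall-precision curve as the area under it, and mAP averages that quantity across classes.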

This figure will only be interesting once we have added in negative training data.

Performance by Feature Size

Calculate Area

Size Distribution by Region

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

yg            mean_size   min_size   max_size   median_size
Other          20496.33        484     107280         10380
Yamal/Gydan    11409.41        512      47696          6204

Plot

Raw IoU Scores:

This is complicated by the fact that the rts_area column is calculated from the raster validation layer, which may contain several RTS features within one tile. Use the rts_area from the original RTS delineation instead, which gives the area of each feature individually.

Run nls models and bootstrap parameters
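The bootstrap logic can be sketched as follows (Python/NumPy with made-up data; the analysis fits nonlinear models with nls() in R, but the resampling step is the same, with a linear fit standing in here for the nls call):

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical data: IoU-like response vs. log RTS area
x = np.linspace(5, 12, 60)
y = 0.05 * x + 0.1 + rng.normal(0, 0.05, x.size)

# bootstrap the fit parameters: resample (x, y) pairs with replacement,
# refit, and take percentile confidence intervals of the coefficients
boot = np.empty((1000, 2))
for i in range(boot.shape[0]):
    idx = rng.integers(0, x.size, x.size)
    boot[i] = np.polyfit(x[idx], y[idx], 1)

slope_ci = np.percentile(boot[:, 0], [2.5, 97.5])  # 95% CI of the slope
```

The same resampled fits can be evaluated on a prediction grid to get the CI ribbons used for plotting below.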

Bootstrap predictions for plotting the nls models

Plot the Size/Performance plot

Active vs. General RTS Performance

imagery      p_val       x_pos   star_y_pos   label_y_pos   p_label           star_label
Maxar        0.1325867     1.5          0.9          0.95   p-value = 0.133
Planet       0.7955779     1.5          0.9          0.95   p-value = 0.796
Sentinel-2   0.5639142     1.5          0.9          0.95   p-value = 0.564

Plot

It is possible to get rid of the inner panel borders, if I decide that looks better: https://stackoverflow.com/questions/46220242/ggplot2-outside-panel-border-when-using-facet

Drivers of Unexpected RTS Prediction Performance

Classify Features Using Confidence Interval Approach

This approach first uses the 95% CI of the model parameters to determine whether RTS features were predicted better or worse than expected based on the model. Next, the threshold at which RTS size doesn’t impact IoU is determined from where the slope of the model approaches 0 (currently using slope < 1e-06) for each imagery type. RTS features smaller than this threshold that were predicted better than expected are analyzed later to determine why some small RTS can be identified from the imagery.
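A minimal sketch of the two steps (Python, with a hypothetical saturating IoU-vs-area curve standing in for the fitted model; the actual CI bounds come from the bootstrapped nls fits):

```python
import numpy as np

def classify_performance(iou, lower, upper):
    """Classify a feature against the 95% CI envelope of the fitted curve."""
    return "High" if iou > upper else ("Low" if iou < lower else "Expected")

def slope_threshold(f, x, tol=1e-6):
    """First x where the numerical slope of the fitted curve drops below tol."""
    dydx = np.gradient(f(x), x)
    flat = x[np.abs(dydx) < tol]
    return flat[0] if flat.size else None

# toy saturating curve: IoU levels off as RTS area grows
f = lambda a: 0.7 * (1 - np.exp(-a / 2000))
area = np.linspace(100, 50000, 5000)
thresh = slope_threshold(f, area)  # area beyond which size no longer matters
```

Features below `thresh` that classify as "High" are the small-but-well-predicted RTS examined in the zonal statistics below.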

imagery      rts_threshold
Maxar              12616.8
Planet             14047.3
Sentinel-2         12832.0

imagery      High   Expected   Low
Maxar          13         51     5
Planet         13         48     8
Sentinel-2     10         46    13

Visualize the RTS predictions

Is performance class the same across imagery sources for most RTS features?

Zonal Statistics

These plots summarize the input data values (mean or standard deviation) in RTS cells, background cells, and the normalized difference between the two (Delta = RTS - Background). Most of the input layer names should be self-explanatory, but for the others:

lum = luminance = 0.299*r + 0.587*g + 0.114*b

sr = shaded relief
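For reference, the luminance formula can be written directly (a Python sketch; band scaling depends on the imagery product):

```python
def luminance(r, g, b):
    """Rec. 601 luma from RGB bands, matching the lum layer definition above."""
    return 0.299 * r + 0.587 * g + 0.114 * b

luminance(1.0, 1.0, 1.0)  # 1.0 for a pure white pixel
```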

A few takeaway points:

  • The NIR and NDVI (mean) figures indicate that RTS are predicted poorly where background cells have low NDVI/plant growth (plants reflect NIR, so vegetated cells have higher NIR values). That is, it is harder for the model to find RTS where there is not much plant growth across the landscape.

  • The NDWI (mean) figure indicates that RTS features are predicted better where it is drier (fewer lakes?).

  • The standard deviation of elevation (surface roughness) figure indicates that RTS features are identified more easily where elevation is more variable within the RTS than in the background cells. That is, a smooth landscape with uneven RTS is easy for the model to find.

I didn’t find much interesting about the rest of these figures, but they are included here for completeness:

Prediction Probability

NIR/NDVI

NDWI

Elevation

Others

PCA Analysis
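A PCA on the zonal statistics could be sketched as follows (Python/NumPy via SVD; random data stands in for the zonal-statistics matrix, whose columns should be standardized first since the stats are on different scales):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the centered data matrix.

    Returns the component scores and the explained variance ratio.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    evr = (S**2) / (S**2).sum()
    return scores, evr[:n_components]

# hypothetical matrix: rows = RTS features, cols = standardized zonal stats
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
scores, evr = pca(X)
```

The first two score columns can then be plotted colored by performance class, RTS probability, IoU, or RTS area, as in the panels below.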

PCA - Prediction quality color scheme

PCA - RTS probability color scheme

PCA - IoU color scheme

PCA - RTS area color scheme

Read in Models from Google Drive